I want to measure the salience of three different dimensions of populism in textual data:
I use data collected by Whitney Hua, Tarik Abou-Chadi and Pablo Barberá.1 They downloaded social media posts authored by the Twitter and Facebook accounts of party leaders and parties in a selected set of Western European countries. They then recruited crowd workers via a crowd-sourcing platform and asked them to judge a given post along each of the above-mentioned dimensions of populism. This data collection effort has yielded a total of 476 judgments per dimension, generated by three different coders for 185 statements.
I assume that the above-defined dimensions are latent features of political statements, and hence crowd coders act as human content analysts whose judgments I want to aggregate at the statement-level to estimate whether a given statement belongs to either of these three categories.
For each dimension, the setup can be described as a four-tuple \(\langle\mathcal{I}, \mathcal{J}, \mathcal{K}, \mathcal{Y}\rangle\), where
We are guaranteed that \(\mathcal{Y}\) contains at least one judgment per statement, that is, \(|\mathcal{Y}_i| \geq 1\ \forall\ i \in \mathcal{I}\), and that \(|\mathcal{Y}_i| \geq 2\ \forall\ i \in \mathcal{I}' \subset \mathcal{I}\). Moreover, judgments for each statement in \(\mathcal{I}'\) are generated by different coders (i.e., we have repeated annotation at the statement level, but no coder judges the same statement twice).
Importantly, while coders’ judgments of statements are observed, true class labels \(l_i \in \mathcal{K}\) are unknown a priori for all \(i = 1,\ \ldots,\ I\). In this setup, a classification of items into classes obtained from a set of judgments \(\rho(\mathcal{Y}) \Rightarrow \mathcal{L}\) is called a ‘ground truth’ or labeling.
I want to fit a Bayesian probabilistic annotation model as described by Bob Carpenter2 in order to estimate the following parameters from the data recorded for the above setup:
\[ \begin{align*} c_i &\sim\ \mbox{Bernoulli}(\pi)\\ \theta_{0j} &\sim\ \mbox{Beta}(\alpha_0 , \beta_0)\\ \theta_{1j} &\sim\ \mbox{Beta}(\alpha_1 , \beta_1)\\ y_{ij} &\sim\ \mbox{Bernoulli}(c_i\theta_{1j} + (1 - c_i)(1 - \theta_{0j}))\\ {}&{}\\ \pi &\sim\ \mbox{Beta}(1,1)\\ \alpha_0/(\alpha_0 + \beta_0) &\sim\ \mbox{Beta}(1,1)\\ \alpha_0+\beta_0 &\sim\ \mbox{Pareto}(1.5)\\ \alpha_1/(\alpha_1 + \beta_1) &\sim\ \mbox{Beta}(1,1)\\ \alpha_1+\beta_1 &\sim\ \mbox{Pareto}(1.5) \end{align*} \] where
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, fig.align = 'center', fig.width = 7, fig.height = 4)
options(digits = 3)
# note: the models below are fit with JAGS via rjags
library(rjags)
library(purrr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggridges)
# important: set seed for reproducibility
set.seed(1234)
I follow Carpenter’s original setup and simulate 20 annotators judging 1000 items with a 50% missing-at-random rate and a prevalence of \(\pi = 0.20\).
# simulation parameters
n_items <- 1000
n_coders <- 20
missingness_rate <- .5
# data generation parameters
pi <- .2
alpha0 <- 40
beta0 <- 8
alpha1 <- 20
beta1 <- 8
Annotators’ specificity and sensitivity parameters \(\theta_{0j}, \theta_{1j}\) are drawn from \(f_{\text{Beta}}(\alpha_0, \beta_0)\) with \((\alpha_0, \beta_0) = (40, 8)\) and \(f_{\text{Beta}}(\alpha_1, \beta_1)\) with \((\alpha_1, \beta_1) = (20, 8)\), respectively.
The probability density functions (PDFs) of the \(\theta_{j\cdot}\) parameters look as follows:
# sampling distribution annotator-specific parameters
pdf_theta0 <- function(x) dbeta(x, alpha0, beta0)
pdf_theta1 <- function(x) dbeta(x, alpha1, beta1)
par(mfrow = c(1,2))
curve(pdf_theta0, from = 0, to = 1, ylab = "Density", xlab = expression(theta[0]), main = "Beta(40, 8)")
curve(pdf_theta1, from = 0, to = 1, ylab = "Density", xlab = expression(theta[1]), main = "Beta(20, 8)")
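These hyperparameters imply fairly able coders on average. As a quick sanity check (a small sketch re-stating the parameter values defined above), the implied prior means are:

```r
# hyperparameters governing coders' ability distributions (as defined above)
alpha0 <- 40; beta0 <- 8  # specificity
alpha1 <- 20; beta1 <- 8  # sensitivity

# the mean of a Beta(a, b) distribution is a / (a + b)
mean_theta0 <- alpha0 / (alpha0 + beta0)  # expected specificity, ~0.833
mean_theta1 <- alpha1 / (alpha1 + beta1)  # expected sensitivity, ~0.714
round(c(specificity = mean_theta0, sensitivity = mean_theta1), 3)
```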
Given this setup, we first sample the 1000 items’ classes from a Bernoulli distribution with p = .2, and then coders’ specificities and sensitivities.
# sample items' classes
sim_item_classes <- as.integer(rbernoulli(n = n_items, p = pi))
# sample coders' ability parameters
sim_theta0 <- rbeta(n = n_coders, shape1 = alpha0, shape2 = beta0)
sim_theta1 <- rbeta(n = n_coders, shape1 = alpha1, shape2 = beta1)
We can inspect the bivariate distribution of coders’ ‘true’ ability parameters:
plot(
x = sim_theta0, y = sim_theta1
, xlim = c(0,1), ylim = c(0,1)
, xlab = expression(theta[0]), ylab = expression(theta[1])
, cex = .5
)
abline(v = mean(sim_theta0), col = "grey")
abline(h = mean(sim_theta1), col = "grey")
They scatter nicely, so that we have different coder types.3
Given these parameters, we can proceed to generate a set of codings. Note that the code below is not the most efficient way to generate these data, but it is a very direct translation of the data generation process expressed in the model specification.
# for each item i
sim_codings <- map_df(1:n_items, function(i){
# sample ~10 coders who judge the item's class
idxs <- sample(1:n_coders, ceiling(n_coders*(1-missingness_rate)))
# for each coder in the sample,
y_i <- map_int(idxs, function(idx) {
# generate coder j's judgment given item i' s 'true' class:
# - if the 'true' class is positive, j classifies it as positive with probability theta_{j1}
# - if the 'true' class is negative, j classifies it as negative with probability theta_{j0}
rbernoulli(1, p = sim_item_classes[i]*sim_theta1[idx] + (1 - sim_item_classes[i])*(1 - sim_theta0[idx]) )
})
# gather data and return
tibble(
# the item index (ID)
item = i
# coders indices (IDs)
, coder = idxs
# simulated judgments
, judgment = y_i
# item i's 'true' class
, true_class = sim_item_classes[i]
)
})
sim_codings
## # A tibble: 10,000 x 4
## item coder judgment true_class
## <int> <int> <int> <int>
## 1 1 10 0 0
## 2 1 1 1 0
## 3 1 6 0 0
## 4 1 12 0 0
## 5 1 3 0 0
## 6 1 2 0 0
## 7 1 19 0 0
## 8 1 4 0 0
## 9 1 7 0 0
## 10 1 14 0 0
## # … with 9,990 more rows
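The number of rows matches what the sampling scheme implies: with 20 coders and a 50% missingness rate, every item receives exactly ceiling(20 × 0.5) = 10 judgments. A quick check (re-stating the simulation parameters so it is self-contained):

```r
n_items <- 1000
n_coders <- 20
missingness_rate <- .5

# each item is judged by this many coders
judgments_per_item <- ceiling(n_coders * (1 - missingness_rate))
n_items * judgments_per_item  # 10000 judgments in total
```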
Given the simulated codings, we first want to check whether the simulated data allow us to recover the input parameters.
# inspect coders' performance
coder_performances <- sim_codings %>%
group_by(coder) %>%
summarise(
n_judgments = n_distinct(item)
, accuracy = sum(judgment == true_class)/n_distinct(item)
, tp = sum(judgment == 1 & true_class == 1)
, fn = sum(judgment == 0 & true_class == 1)
, fp = sum(judgment == 1 & true_class == 0)
, tn = sum(judgment == 0 & true_class == 0)
, TPR = tp / (tp + fn)
, TNR = tn / (tn + fp)
) %>%
arrange(desc(accuracy))
We can see that the simulated, i.e., ‘observed’, coder-specific specificities and sensitivities fall in the ranges [0.695, 0.929] and [0.515, 0.825], respectively, and largely overlap with the (randomly drawn) ‘true’ but unobserved parameter values. Overall, simulated coders are fairly accurate, resulting in a fairly high corpus-level accuracy:
# inspect confusion matrix
m_confusion <- sim_codings %>%
group_by(true_class, judgment) %>%
summarise(n = n()) %>%
arrange(-true_class, -judgment) %>%
ungroup()
m_confusion
## # A tibble: 4 x 3
## true_class judgment n
## <int> <int> <int>
## 1 1 1 1472
## 2 1 0 608
## 3 0 1 1392
## 4 0 0 6528
m_confusion %>%
summarize(accuracy = sum(ifelse(true_class == judgment, n, 0))/sum(n))
## # A tibble: 1 x 1
## accuracy
## <dbl>
## 1 0.8
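This corpus-level accuracy should be read against a trivial baseline, though: with a prevalence of .2, a degenerate classifier that always predicts the negative class achieves the same accuracy of .8. A minimal check:

```r
pi <- .2
# an all-negative classifier is correct whenever the true class is negative,
# i.e., with probability 1 - pi
baseline_accuracy <- 1 - pi
baseline_accuracy  # 0.8
```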
The complete JAGS code for the model reads as follows:
model{
for (i in 1:N){
c[i] ~ dbern(pi)
}
for (j in 1:M) {
theta0[j]~ dbeta(alpha0, beta0);
theta1[j]~ dbeta(alpha1, beta1);
}
for (j in 1:J) {
y[j,3] ~ dbern(c[y[j,1]]*theta1[y[j,2]]+(1-c[y[j,1]])*(1-theta0[y[j,2]]))
}
pi ~ dbeta(1,1);
mean0 ~ dbeta(1,1);
scale0 ~ dpar(1.5,1);
alpha0 <- mean0*scale0;
beta0 <- scale0-alpha0;
mean1 ~ dbeta(1,1);
scale1 ~ dpar(1.5,1);
alpha1 <- mean1*scale1;
beta1 <- scale1-alpha1;
}
Let’s walk through the code step by step: We begin by modelling items’ class membership as draws from a Bernoulli distribution governed by pi, the ‘true’ prevalence of the positive class.
for (i in 1:N){
c[i] ~ dbern(pi)
}
The positive-class prevalence, in turn, is given a Beta prior with shape parameters \(a = b = 1\).
pi ~ dbeta(1,1)
This flat prior reflects our lack of knowledge regarding the prevalence of positive instances in the data.
Next, we draw coders’ specificities and sensitivities, respectively, from two different Beta distributions that are in turn governed by different sets of hyperparameters, (alpha0, beta0) and (alpha1, beta1), which are themselves part of the model.
for (j in 1:M) {
theta0[j]~ dbeta(alpha0, beta0);
theta1[j]~ dbeta(alpha1, beta1);
}
The distributions of coder abilities are parameterized in terms of the mean and scale of the Beta distribution. Specifically, following Carpenter, we put a uniform prior on the mean \(\alpha/(\alpha+\beta)\) and a Pareto prior with shape \(1.5\) and lower bound \(c = 1\) on the scale \(\alpha+\beta\), which implies \(p(\alpha+\beta) \propto (\alpha+\beta)^{-5/2}\). The flat prior on the mean is conveniently expressed as a Beta prior with \(a = b = 1\):
mean0 ~ dbeta(1,1);
scale0 ~ dpar(1.5,1);
alpha0 <- mean0*scale0;
beta0 <- scale0-alpha0;
mean1 ~ dbeta(1,1);
scale1 ~ dpar(1.5,1);
alpha1 <- mean1*scale1;
beta1 <- scale1-alpha1;
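For reference, JAGS’s dpar(alpha, c) has density \(\alpha c^{\alpha} x^{-(\alpha+1)}\) for \(x \geq c\), so dpar(1.5, 1) is proportional to \(x^{-5/2}\). A hand-rolled sketch of this density in R (avoiding a package dependency):

```r
# Pareto density matching JAGS's dpar(alpha, c): alpha * c^alpha / x^(alpha + 1)
dpar_density <- function(x, alpha = 1.5, c = 1) {
  ifelse(x < c, 0, alpha * c^alpha / x^(alpha + 1))
}

dpar_density(1)  # 1.5, the density at the lower bound of the support
dpar_density(2)  # 1.5 / 2^2.5, about 0.265
```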
Let’s have a look at how the mean and scale parameters are distributed:
par(mfrow = c(1,2))
pdf_flat_beta <- function(x) dbeta(x, 1, 1)
curve(
pdf_flat_beta
, from = 0, to = 1
, main = "Prior on the mean of coder-specific ability parameters"
, ylab = "Density"
, xlab = expression(alpha/(alpha + beta))
)
pdf_pareto <- function(x) extraDistr::dpareto(x, 1.5, 1)
curve(
pdf_pareto
, from = 1, to = 20
, main = "Prior on scale of coder-specific ability parameters"
, ylab = "Density"
, xlab = expression((alpha + beta))
)
If we sample 1000 pairs from these distributions and draw from Beta distributions with corresponding parameters, we get the following empirical distribution:
beta_means <- rbeta(1000, 1, 1)
beta_scales <- extraDistr::rpareto(1000, 1.5, 1)
# we take advantage of vectorization in R
alphas <- beta_means*beta_scales
betas <- beta_scales-alphas
x <- rbeta(1000, alphas, betas)
hist(x)
Essentially, our prior pushes coders’ ability parameters to the extremes. This does not seem to be a very wise choice, since it reflects a belief that coders are either highly capable or adversarial (i.e., their judgments are negatively correlated with the true classes), with mediocre and slightly negatively correlated coders being rare.
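This can be made numeric. The following self-contained sketch (using inverse-CDF sampling for the Pareto instead of extraDistr; the exact shares depend on the seed) shows that far more prior mass lands in the extremes of the unit interval than in its middle:

```r
set.seed(42)
n <- 1e5
# Beta means uniform on (0, 1); scales from a Pareto(1.5, 1) via inverse CDF
means  <- runif(n)
scales <- runif(n)^(-1 / 1.5)
# ability parameters drawn from the implied Beta distributions
theta <- rbeta(n, means * scales, scales * (1 - means))

mean(theta < .1 | theta > .9)  # share of prior mass in the extremes
mean(theta > .4 & theta < .6)  # share of prior mass in the middle
```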
Before fitting the models, we need to create a list containing all the data required by the model.
# data
codings_data <- list()
# number of items
codings_data$N <- length(unique(sim_codings$item))
# number of coders
codings_data$M <- length(unique(sim_codings$coder))
# number of judgements
codings_data$J <- length(sim_codings$judgment)
# judgement data
codings_data$y <- sim_codings %>%
arrange(item, coder) %>%
# columns: item ID, coder ID, judgement
select(item, coder, judgment) %>%
# JAGS expects a numeric matrix, not a data frame
as.matrix()
As a baseline fit, I obtained 1000 iterations from 3 chains after 500 burn-in iterations.
model_file_path <- "./models/beta-binomial_by_annotator.jags"
# for obtaining Deviance Information Criterion (DIC)
load.module("dic")
# initialize model with 3 chains
baseline_model <- jags.model(
file = model_file_path
, data = codings_data
, n.chains = 3
)
# burn-in for 500 iterations
update(baseline_model, 500)
# fit model for 1000 iterations
baseline_fit <- coda.samples(
baseline_model
, variable.names = c(
# include DIC
"deviance"
# prevalence
, "pi"
# classes
, "c"
# coder ability parameters
, "theta0", "theta1"
# hyperpriors governing Beta prior on coder specificities
, "alpha0", "beta0"
# hyperpriors governing Beta prior on coder sensitivities
, "alpha1", "beta1"
)
# 1000 iterations
, n.iter = 1000
# no thinning
, thin = 1
)
Updating and fitting the model took 1.169 minutes on my MacBook Pro with a 3.5 GHz Intel Core i7 processor, using one core.
First, we want to see whether the chains mix and whether the model has converged. Given the large number of parameters in our model, we use the deviance (made available via the dic module) to assess these questions.
plot(baseline_fit[, "deviance"])
coda::gelman.plot(baseline_fit[, "deviance"])
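Besides the plot, the Gelman–Rubin statistic can also be inspected numerically (e.g., via coda::gelman.diag on the fit). As an illustration of what that statistic computes, here is a hand-rolled sketch of the basic potential scale reduction factor applied to simulated chains rather than the actual fit (coda applies additional degrees-of-freedom corrections):

```r
# basic Gelman-Rubin potential scale reduction factor for a list of chains
rhat <- function(chains) {
  n <- length(chains[[1]])
  B <- n * var(sapply(chains, mean))  # between-chain variance
  W <- mean(sapply(chains, var))      # within-chain variance
  sqrt(((n - 1) / n * W + B / n) / W)
}

set.seed(1)
well_mixed <- lapply(1:3, function(k) rnorm(1000))            # PSRF close to 1
not_mixed  <- lapply(c(0, 5, -5), function(m) rnorm(1000, m)) # PSRF far above 1
c(rhat(well_mixed), rhat(not_mixed))
```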
As we can see, the model converged nicely after only a few iterations.
We also find only very limited autocorrelation at the first and second lags, so there is apparently no need for thinning.
coda::autocorr.plot(baseline_fit[, "deviance"], ask = F)
The chains mix nicely, and the model converges quickly:
plot(baseline_fit[, "pi"], main = expression(pi))
coda::gelman.plot(baseline_fit[, "pi"], main = expression(pi))
But the posterior density has its mass in the region \([.78, .81]\)!
ggplot(
tibble(x = unlist(baseline_fit[, "pi"]))
, aes(x = x)
) +
geom_density(color = "grey", fill = "lightgrey", alpha = .75) +
scale_x_continuous(limits = c(0,1)) +
geom_vline(xintercept = pi, color = "red") +
theme_bw() +
labs (
title = expression(paste("Marginal posterior density of ", pi))
, subtitle = "Red line indicates 'true' value"
, y = "Density"
, x = ""
)
Since we have 20 marginal posterior distributions for each of \(\theta_0\) and \(\theta_1\), we omit visual inspection of chains and autocorrelation for the coder-level parameters and rely on the deviance to judge convergence.
Instead, we directly plot the marginal posterior densities by coder and parameter:
# gives n_coders*n_chains*n_iterations = 60,000 rows
post_thetas <- map_df(1:n_coders, function(j)
tibble(
coder = j
, theta0 = unlist(baseline_fit[, sprintf("theta0[%s]", j)] )
, theta1 = unlist(baseline_fit[, sprintf("theta1[%s]", j)] )
)
)
post_thetas %>%
gather(parameter, value, -coder) %>%
ggplot(
aes(
y = factor(coder)
, x = value
, group = coder
)
) +
geom_density_ridges(
scale = .5
, color = NA
) +
geom_point(
shape = 3
, color = "red"
, data = tibble(
coder = rep(1:n_coders, 2)
, parameter = rep(paste0("theta", 0:1), each = n_coders)
, value = c(sim_theta0, sim_theta1)
)
, aes(y = factor(coder), x = value, group = coder)
) +
scale_x_continuous(limits = c(0,1))+
facet_grid(rows = vars(parameter)) +
labs(
title = "Marginal posterior densities of ability parameters by coder"
, subtitle = "Red crosses mark 'true' values"
, x = ""
, y = "Coder"
, fill = "Parameter"
) +
coord_flip() +
theme_bw()
## Picking joint bandwidth of 0.00756
## Picking joint bandwidth of 0.00338
Just as in the case of prevalence, we can clearly see how ability parameter estimates are consistently flipped relative to the ‘true’ simulation parameter values.
In fact, estimates and true values are negatively correlated as the next plot illustrates:
post_thetas %>%
gather(param, value, -coder) %>%
group_by(coder, param) %>%
summarise(
mean = mean(value)
, q5 = quantile(value, .05)
, q95 = quantile(value, .95)
) %>%
arrange(param, coder) %>%
ungroup() %>%
mutate(true_val = c(sim_theta0, sim_theta1)) %>%
ggplot(aes(x = true_val, y = mean)) +
# geom_smooth(method='lm', formula=y~x, se = FALSE) +
geom_point(size = .5) +
geom_linerange(aes(ymin = q5, ymax = q95)) +
scale_x_continuous(limits = 0:1) +
scale_y_continuous(limits = 0:1) +
geom_abline(slope = 1, intercept = 0, size = .5, color = "grey") +
facet_grid(~param) +
theme_bw() +
labs(
title = "Mean posterior coder abilities vs. 'true' parameter values."
, subtitle = "Vertical lines overlaying mean values depict 90% credible intervals; the diagonal indicates perfect correspondence."
, x = "Simulated parameter value"
, y = "Posterior value"
)
The data pull the ability parameters towards low values, but the informative Beta priors pull in the opposite direction. We can also see how the more informative (less dispersed) prior on \(\theta_0\) (\(f_{B}(40, 8)\)) nonetheless results in higher dispersion of the corresponding posteriors.
Finally, we can take a look at the model-based classification of items relative to their true simulated values.
# gives n_items*n_chains*n_iterations = 1000*3*1000 = 3,000,000 rows
post_c <- map_df(1:n_items, function(i)
tibble(item = i, c = unlist(baseline_fit[, sprintf("c[%s]", i)]))
)
post_c %>%
group_by(item) %>%
summarise(pr_pos = sum(c)/n()) %>%
mutate(est_class = as.integer(pr_pos > .5)) %>%
left_join(tibble(item = 1:n_items, true_class = sim_item_classes)) %>%
mutate(aggreement = true_class == est_class) %>%
summarise(accuracy = sum(aggreement)/n())
## Joining, by = "item"
## # A tibble: 1 x 1
## accuracy
## <dbl>
## 1 0.02
Model-based classifications of items almost perfectly contradict the true class memberships.
In sum, the evaluation of the baseline model suggests that we need to constrain the model to induce convergence on the true values rather than on their mirror images.
Alternatively, post-hoc adjustment may be an option. However, in contrast to classical item response models, we cannot standardize the latent class estimates, since they follow a Bernoulli distribution.
Also, outside the context of simulated data, items’ true classes are unknown, so there is no direct way to tell whether the model has converged on the ‘correct’ mode or on its mirror image.
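That said, when we are willing to assume a priori that the positive class is the minority class, a simple post-hoc relabeling heuristic (my own sketch, not part of Carpenter’s model) could flip a mirrored solution back:

```r
# hypothetical post-hoc fix: if the estimated prevalence exceeds .5 although
# the positive class is known to be rare, flip all class estimates
flip_if_mirrored <- function(est_class) {
  if (mean(est_class) > .5) 1L - est_class else est_class
}

# toy example: a 'mirrored' classification with estimated prevalence .8
flip_if_mirrored(c(1L, 1L, 0L, 1L, 1L))  # becomes c(0, 0, 1, 0, 0)
```

The coder ability parameters would have to be relabeled accordingly, swapping the roles of \(\theta_0\) and \(\theta_1\).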
I also tried to induce correct convergence by using the following priors on \(\pi\):
par(mfrow = c(1,3))
curve(dbeta(x*2, 2, 2), 0, 1, main = expression(x%~%f[B](2,2)/2))
curve(dbeta(x*2, 1.5, 1.5), 0, 1, main = expression(x%~%f[B](1.5,1.5)/2))
curve(dbeta(x*2, 1.1, 1.1), 0, 1, main = expression(x%~%f[B](1.1,1.1)/2))
I implemented this in the JAGS code for the second prior density by replacing the line pi ~ dbeta(1,1) by the following code
pi_temp ~ dbeta(1.5, 1.5);
pi <- pi_temp/2;
The idea here is that if \(\pi\) is a priori constrained to the range \([0, .5]\), the data cannot pull the estimate into the range \((.5, 1]\), and since item classes are drawn from \(\text{Bernoulli}(\pi)\), this should induce convergence on the correct parameter values. This did not work as intended, however.
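The constraint itself works as intended: a draw from Beta(1.5, 1.5) divided by two can never exceed .5, as a quick simulation confirms:

```r
set.seed(123)
# draws from the scaled prior: pi = pi_temp / 2 with pi_temp ~ Beta(1.5, 1.5)
pi_draws <- rbeta(1e5, 1.5, 1.5) / 2
range(pi_draws)  # contained in [0, 0.5]
```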
Next, I wondered whether the problem resulted from the oddly shaped prior distribution generated by the hyperpriors \((\alpha_0, \beta_0)\) and \((\alpha_1, \beta_1)\). I thus discarded the multilevel component of the model and simply gave coders’ ability parameters flat priors on the interval [0, 1] using a Beta distribution with \(a = b = 1\).
This adapted model was then specified as follows:
model{
for (i in 1:N){
c[i] ~ dbern(pi)
}
for (j in 1:M) {
theta0[j]~ dbeta(alpha0, beta0);
theta1[j]~ dbeta(alpha1, beta1);
}
for (j in 1:J) {
y[j,3] ~ dbern(c[y[j,1]]*theta1[y[j,2]]+(1-c[y[j,1]])*(1-theta0[y[j,2]]))
}
pi ~ dbeta(1,1);
alpha0 <- 1;
beta0 <- 1;
alpha1 <- 1;
beta1 <- 1;
}
However, the results of this exercise were mixed: one of the three chains would oscillate around .8 and the other two around .2, or vice versa.
Next, instead of a flat prior on \(\pi\), I used an informative Beta prior with \(a = 1.5, b = 7\). That is, I changed the line pi ~ dbeta(1,1) in the adapted model to pi ~ dbeta(1.5,7). The prior looks as follows:
curve(dbeta(x, 1.5, 7), from = 0, to = 1, ylab = "Density", xlab = expression(pi), main = "Beta(1.5, 7)")
This was the only way I found to obtain correct classifications and posterior densities centered on or around the simulated values.
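One plausible reason why this prior helps: its mean lies close to the simulated prevalence of .2, so it pulls \(\pi\) towards the correct of the two modes:

```r
a <- 1.5
b <- 7
a / (a + b)  # prior mean of Beta(1.5, 7), about 0.176
```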
init_vals <- lapply(1:3, function(chain) {
out <- list("pi" = rbeta(1, 1, 8))
out[[".RNG.name"]] <- "base::Wichmann-Hill"
out[[".RNG.seed"]] <- 1234
return(out)
})
# load model
model_file_path <- "./models/beta-binomial_by_annotator_adapted.jags"
adapted_model <- jags.model(
file = model_file_path
, data = codings_data
, inits = init_vals
, n.chains = 3
)
update(adapted_model, 500)
adapted_fit <- coda.samples(
adapted_model
, variable.names = c(
"deviance"
, "pi"
, "c"
, "theta0", "theta1"
)
, n.iter = 1000
, thin = 1
)
First, we want to see whether the chains mix and whether the model has converged. We again confine visual inspection to the deviance.
plot(adapted_fit[, "deviance"])
coda::gelman.plot(adapted_fit[, "deviance"])
Again, the model converges nicely, with very limited autocorrelation at the first and second lags.
coda::autocorr.plot(adapted_fit[, "deviance"], ask = F)
The chains mix nicely, and the model converges quickly.
plot(adapted_fit[, "pi"], main = expression(pi))
coda::gelman.plot(adapted_fit[, "pi"], main = expression(pi))
And it does so on the simulated value:
ggplot(
tibble(x = unlist(adapted_fit[, "pi"]))
, aes(x = x)
) +
geom_density(color = "grey", fill = "lightgrey", alpha = .75) +
scale_x_continuous(limits = c(0,1)) +
geom_vline(xintercept = pi, color = "red") +
theme_bw() +
labs (
title = expression(paste("Marginal posterior density of ", pi))
, subtitle = "Red line indicates 'true' value"
, y = "Density"
, x = ""
)
Inspecting the marginal posterior densities of the ability parameters by coder and parameter, we also find that the adapted model by and large converges on the simulated parameters:
# gives n_coders*n_chains*n_iterations = 60,000 rows
post_thetas <- map_df(1:n_coders, function(j)
tibble(
coder = j
, theta0 = unlist(adapted_fit[, sprintf("theta0[%s]", j)] )
, theta1 = unlist(adapted_fit[, sprintf("theta1[%s]", j)] )
)
)
post_thetas %>%
gather(parameter, value, -coder) %>%
ggplot(
aes(
y = factor(coder)
, x = value
, group = coder
)
) +
geom_density_ridges(
scale = .5
, color = NA
) +
geom_point(
shape = 3
, color = "red"
, data = tibble(
coder = rep(1:n_coders, 2)
, parameter = rep(paste0("theta", 0:1), each = n_coders)
, value = c(sim_theta0, sim_theta1)
)
, aes(y = factor(coder), x = value, group = coder)
) +
scale_x_continuous(limits = c(0,1))+
facet_grid(rows = vars(parameter)) +
labs(
title = "Marginal posterior densities of ability parameters by coder"
, subtitle = "Red crosses mark 'true' values"
, x = ""
, y = "Coder"
, fill = "Parameter"
) +
coord_flip() +
theme_bw()
## Picking joint bandwidth of 0.00345
## Picking joint bandwidth of 0.00842
For the fit obtained from the adapted model, estimates and true values are now positively correlated:
post_thetas %>%
gather(param, value, -coder) %>%
group_by(coder, param) %>%
summarise(
mean = mean(value)
, q5 = quantile(value, .05)
, q95 = quantile(value, .95)
) %>%
arrange(param, coder) %>%
ungroup() %>%
mutate(true_val = c(sim_theta0, sim_theta1)) %>%
ggplot(aes(x = true_val, y = mean)) +
# geom_smooth(method='lm', formula=y~x, se = FALSE) +
geom_point(size = .5) +
geom_linerange(aes(ymin = q5, ymax = q95)) +
scale_x_continuous(limits = 0:1) +
scale_y_continuous(limits = 0:1) +
geom_abline(slope = 1, intercept = 0, size = .5, color = "grey") +
facet_grid(~param) +
theme_bw() +
labs(
title = "Mean posterior coder abilities vs. 'true' parameter values."
, subtitle = "Vertical lines overlaying mean values depict 90% credible intervals; the diagonal indicates perfect correspondence."
, x = "Simulated parameter value"
, y = "Posterior value"
)
Finally, we can take a look at the model-based classification of items relative to their true simulated values.
# gives n_items*n_chains*n_iterations = 1000*3*1000 = 3,000,000 rows
post_c <- map_df(1:n_items, function(i)
tibble(item = i, c = unlist(adapted_fit[, sprintf("c[%s]", i)]))
)
post_c %>%
group_by(item) %>%
summarise(pr_pos = sum(c)/n()) %>%
mutate(est_class = as.integer(pr_pos > .5)) %>%
left_join(tibble(item = 1:n_items, true_class = sim_item_classes)) %>%
mutate(aggreement = true_class == est_class) %>%
summarise(accuracy = sum(aggreement)/n())
## Joining, by = "item"
## # A tibble: 1 x 1
## accuracy
## <dbl>
## 1 0.981
The (unadjusted) classification accuracy is almost perfect.
This is slightly better than majority voting:
sim_codings %>%
group_by(item) %>%
summarise(
n_pos = sum(judgment)
, n_neg = sum(!judgment)
, tie_breaker = rbernoulli(1)
, voted = case_when(
n_pos > n_neg ~ 1L
, n_pos < n_neg ~ 0L
, TRUE ~ as.integer(tie_breaker)
)
, true_class = unique(true_class)
) %>%
ungroup() %>%
summarise(`majority vote: accuracy` = sum(voted == true_class)/n())
## # A tibble: 1 x 1
## `majority vote: accuracy`
## <dbl>
## 1 0.98
Hua, Whitney, Tarik Abou-Chadi, and Pablo Barberá (2018). “Networked Populism: Characterizing the Public Rhetoric of Populist Parties in Europe”. Paper prepared for the 2018 MPSA Conference.↩
Carpenter, Bob (2008). “Multilevel Bayesian Models of Categorical Data Annotation”. unpublished manuscript.↩
That is, individual coders combine different capabilities ranging from low to high specificity as well as from low to high sensitivity, respectively.↩